joint probability distribution
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Alameda County > Oakland (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Reasoning Distillation and Structural Alignment for Improved Code Generation
Jalilifard, Amir, Rocha, Anderson de Rezende, Raimundo, Marcos Medeiros
Effective code generation with language models hinges on two critical factors: accurately understanding the intent of the prompt, and generating code that applies algorithmic reasoning to produce correct solutions that pass diverse test cases while adhering to the syntax of the target programming language. Unlike other language tasks, code generation requires more than accurate token prediction. It demands comprehension of solution-level and structural relationships rather than merely generating the most likely tokens. Very large language models (VLLMs) can generate detailed steps toward the correct solution of complex tasks in which reasoning is crucial, but smaller language models may lack such reasoning capabilities. Therefore, in this work, we distill the reasoning capabilities of a VLLM into a smaller model that is faster and cheaper to deploy. Our approach trains the model to emulate the reasoning and problem-solving abilities of the VLLM in two ways: by learning to identify correct solution pathways, and by establishing a structural correspondence between problem definitions and potential solutions through a novel structure-aware loss optimization method. This enables the model to move beyond token-level generation and grasp the overall structure of solutions to given problems. Experimental results show that our fine-tuned model, developed through a cheap and simple-to-implement process, significantly outperforms our baseline model in terms of pass@1, average data flow, and average syntax match metrics across the MBPP, MBPP Plus, and HumanEval benchmarks.
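The abstract reports pass@1 but does not say how it is computed. This sketch assumes a common choice, the unbiased pass@k estimator published alongside the HumanEval benchmark. For each problem, draw n samples, count the c that pass every test, and estimate pass@k = 1 - C(n-c, k) / C(n, k). For k = 1 this reduces to c/n. The function names and the per-problem counts are hypothetical and do not come from the paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass all tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample, complemented.
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k: int) -> float:
    """Benchmark score: average of per-problem pass@k; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical per-problem counts (n samples, c passing), not results from the paper.
results = [(10, 3), (10, 0), (10, 10)]
print(mean_pass_at_k(results, k=1))
```

Averaging per-problem estimates, rather than pooling all samples, gives each benchmark problem equal weight.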
- Research Report > New Finding (0.66)
- Research Report > Promising Solution (0.54)
Canonical Representations of Markovian Structural Causal Models: A Framework for Counterfactual Reasoning
Counterfactual reasoning aims at answering contrary-to-fact questions like "Would Alice have recovered had she taken aspirin?" and corresponds to the most fine-grained layer of causation. Critically, many counterfactual statements cannot be falsified, even by randomized experiments, yet they underpin fundamental concepts like individual-wise fairness. Therefore, providing models to formalize and implement counterfactual beliefs remains a fundamental scientific problem. In the Markovian setting of Pearl's causal framework, we propose an alternative to structural causal models for representing counterfactuals compatible with a given causal graphical model. More precisely, we introduce counterfactual models, also called canonical representations of structural causal models. They enable analysts to choose a counterfactual assumption via random-process probability distributions with preassigned marginals, and they characterize the counterfactual equivalence class of structural causal models. Using these representations, we present a normalization procedure that disentangles the (arbitrary and unfalsifiable) counterfactual choice from the (typically testable) interventional constraints. In contrast to structural causal models, this allows one to implement many counterfactual assumptions while preserving interventional knowledge. It also requires no estimation step at the individual-counterfactual layer, only a choice. Finally, we illustrate the specific role of counterfactuals in causality and the benefits of our approach on theoretical and numerical examples.
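The central point, that a counterfactual assumption is a choice of joint law with fixed marginals, can be illustrated on the aspirin question. This is a minimal sketch under the following assumptions:
- The treatment T and outcome Y are binary.
- There is no confounding, so the potential outcomes (Y0, Y1) are independent of T.
- The recovery rates are hypothetical.

The sketch does not implement the authors' canonical representation. It compares two textbook couplings of the same interventional marginals. Both couplings agree on every interventional quantity, yet they answer the counterfactual question differently.

```python
from fractions import Fraction

def comonotonic_coupling(p0, p1):
    """Joint law of binary potential outcomes (Y0, Y1) that maximizes their agreement."""
    both = min(p0, p1)
    return {(1, 1): both, (0, 1): p1 - both, (1, 0): p0 - both,
            (0, 0): 1 - p0 - p1 + both}

def independent_coupling(p0, p1):
    """Joint law in which Y0 and Y1 are independent."""
    return {(a, b): (p0 if a else 1 - p0) * (p1 if b else 1 - p1)
            for a in (0, 1) for b in (0, 1)}

def marginals(joint):
    """Interventional probabilities P(Y=1 | do(T=0)) and P(Y=1 | do(T=1))."""
    p0 = sum(v for (a, _), v in joint.items() if a == 1)
    p1 = sum(v for (_, b), v in joint.items() if b == 1)
    return p0, p1

def p_recover_if_treated_given_untreated_failure(joint):
    """Counterfactual P(Y1=1 | T=0, Y=0); without confounding this equals P(Y1=1 | Y0=0)."""
    return joint[(0, 1)] / (joint[(0, 0)] + joint[(0, 1)])

# Hypothetical recovery rates without and with aspirin.
p0, p1 = Fraction(3, 10), Fraction(7, 10)
for name, coupling in [("comonotonic", comonotonic_coupling),
                       ("independent", independent_coupling)]:
    joint = coupling(p0, p1)
    print(name, marginals(joint), p_recover_if_treated_given_untreated_failure(joint))
```

Both couplings reproduce the same interventional marginals, so no experiment can tell them apart. The counterfactual answers still differ: 4/7 under the comonotonic coupling and 7/10 under the independent one.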
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New York (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.86)
- Health & Medicine > Consumer Health (0.48)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.34)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Alameda County > Oakland (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Physical models realizing the transformer architecture of large language models
The introduction of the transformer architecture in 2017 marked the most striking advancement in natural language processing. The transformer is a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is and how it works physically. From a physical perspective, given modern chips such as those fabricated at process nodes below 28 nm, modern intelligent machines should be regarded as open quantum systems that go beyond conventional statistical systems. Therefore, in this paper, we construct physical models that realize transformer-based large language models as open quantum systems in the Fock space over the Hilbert space of tokens. Our physical models underlie the transformer architecture for large language models.
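For reference, the attention mechanism mentioned above is the classical scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below computes it in plain Python for a single head, without masking. It shows only the conventional computation, not the open-quantum-system model the paper constructs. The toy matrices are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Each query scores every key, so every position can attend to every other.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # The output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Hypothetical 3-token example with 2-dimensional queries, keys, and values.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(scaled_dot_product_attention(Q, K, V))
```

Every query is compared with every key, and this is how attention draws the "global dependencies" the abstract mentions.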
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Compositional Understanding in Signaling Games
In existing signaling-game models, even when signalers send compositional messages, receivers do not interpret them compositionally: when information from one message component is lost or forgotten, the information from the other components is erased as well. In this paper, I construct signaling game models in which genuine compositional understanding evolves. I present two new models: a minimalist receiver who learns only from the atomic messages of a signal, and a generalist receiver who learns from all of the available information. These models are in many ways simpler than previous alternatives, and they allow receivers to learn from the atomic components of messages.
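The idea of a receiver that learns from atomic message components can be sketched as follows. This is a hypothetical toy, not the paper's minimalist receiver, and it rests on these assumptions:
- Learning uses Roth-Erev urn reinforcement, a common choice in the signaling-game literature.
- The sender is fixed and compositional.
- Reward is given separately for each feature of the state.

Because each atomic message has its own urns, the receiver can correctly interpret a signal combination it never saw during training.

```python
import random

def train_minimalist_receiver(states, rounds, rng):
    """Roth-Erev urn learning for a receiver that treats each atomic message separately.

    Hypothetical setup: states are (color, shape) pairs, and a fixed compositional
    sender emits one atomic message per feature, naming that feature directly.
    """
    # urns[component][message][act] -> accumulated reinforcement weight
    urns = [{m: [1.0, 1.0] for m in (0, 1)} for _ in range(2)]
    for _ in range(rounds):
        state = rng.choice(states)
        signal = state  # fixed compositional sender
        for comp, msg in enumerate(signal):
            weights = urns[comp][msg]
            act = rng.choices((0, 1), weights=weights)[0]
            if act == state[comp]:
                weights[act] += 1.0  # reinforce only the component that was right
    return urns

def decode(urns, signal):
    """Greedy interpretation: the most reinforced act for each atomic message."""
    return tuple(max((0, 1), key=lambda a: urns[comp][msg][a])
                 for comp, msg in enumerate(signal))

rng = random.Random(0)
train_states = [(0, 0), (0, 1), (1, 0)]  # (1, 1) is never seen during training
urns = train_minimalist_receiver(train_states, rounds=3000, rng=rng)
print(decode(urns, (1, 1)))
```

A holistic receiver keeps one urn per whole signal, so it would have nothing learned for the unseen signal (1, 1). The component-wise urns cover it.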
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Unfaithful Probability Distributions in Binary Triple of Causality Directed Acyclic Graph
Faithfulness between a probability distribution and a graph is foundational to causal discovery and causal inference. In this paper, several examples of unfaithful probability distributions are constructed for three-vertex binary causal directed acyclic graph (DAG) structures. These distributions are not faithful to the causal DAGs described in J. M. Robins et al., "Uniform consistency in causal inference," Biometrika 90(3): 491-515 (2003). The general form of unfaithful probability distributions exhibiting multiple independencies and conditional independencies in binary three-vertex causal DAGs is also given.
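As a concrete illustration of unfaithfulness in a binary three-vertex DAG, the sketch below uses hand-picked, hypothetical parameters on the complete DAG X -> Y -> Z, X -> Z. It is not one of the paper's examples, nor one from Robins et al. A complete DAG implies no independencies. However, the direct effect of X on Z exactly cancels the indirect effect through Y, so X and Z come out marginally independent, and the distribution is therefore unfaithful to the graph.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters for the complete DAG X -> Y -> Z, X -> Z (all binary).
p_x1 = F(1, 2)
p_y1_given_x = {0: F(1, 5), 1: F(4, 5)}
p_z1_given_xy = {(0, 0): F(1, 2), (0, 1): F(1, 2), (1, 0): F(9, 10), (1, 1): F(2, 5)}

def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v == 1 else 1 - p1

# Joint distribution factorized along the DAG: P(x) P(y | x) P(z | x, y).
joint = {(x, y, z): bern(p_x1, x) * bern(p_y1_given_x[x], y) * bern(p_z1_given_xy[(x, y)], z)
         for x, y, z in product((0, 1), repeat=3)}

def prob(pred):
    """Total probability of the outcomes (x, y, z) satisfying pred."""
    return sum(p for xyz, p in joint.items() if pred(*xyz))

# X and Z are marginally independent even though X -> Z is an edge ...
marginally_independent = all(
    prob(lambda x, y, z: x == a and z == c)
    == prob(lambda x, y, z: x == a) * prob(lambda x, y, z: z == c)
    for a, c in product((0, 1), repeat=2))

# ... yet Z still depends on X once Y is fixed, so the edge is not spurious.
def p_z1_given(x0, y0):
    return prob(lambda x, y, z: (x, y, z) == (x0, y0, 1)) / prob(lambda x, y, z: (x, y) == (x0, y0))

print(marginally_independent, p_z1_given(0, 0), p_z1_given(1, 0))
```

Exact `Fraction` arithmetic matters here, because the cancellation holds only on a measure-zero set of parameters that floating point could blur.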
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Europe > Czechia > Prague (0.05)
A Data-driven Dynamic Temporal Correlation Modeling Framework for Renewable Energy Scenario Generation
Dong, Xiaochong, Liu, Yilin, Zhang, Xuemin, Mei, Shengwei
Renewable power output is influenced by the atmospheric system, which exhibits nonlinear and time-varying features. To address this, a dynamic temporal correlation modeling framework is proposed for renewable energy scenario generation. A novel decoupled mapping path is employed for joint probability distribution modeling. It formulates regression tasks for both the marginal distributions and the correlation structure, and it uses proper scoring rules to ensure the rationality of the modeling process. The scenario generation process is divided into two stages. First, the dynamic correlation network models temporal correlations based on a dynamic covariance matrix, capturing the time-varying features of renewable energy while enhancing the interpretability of the black-box model. Second, the implicit quantile network models the marginal quantile function in a nonparametric, continuous manner, enabling scenario generation through marginal inverse sampling. Experimental results demonstrate that the proposed dynamic correlation quantile network outperforms state-of-the-art methods in quantifying uncertainty and capturing dynamic correlation for short-term renewable energy scenario generation.
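The two-stage generation (a correlation structure, then marginal inverse sampling) can be sketched with a Gaussian copula. The copula is an assumption: the abstract does not state how the dynamic covariance matrix is coupled to the marginals. In the paper, a network would predict both the correlation matrix for each forecast window and the quantile functions. Here both are hypothetical stand-ins: a fixed AR(1)-style correlation matrix and linear quantile functions for an hourly capacity factor.

```python
import math
import random
from statistics import NormalDist

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def ar1_correlation(n, rho):
    """Hypothetical stand-in for the correlation matrix predicted for one forecast window."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def sample_scenario(corr, quantile_fns, rng):
    """Correlated Gaussian draw -> uniforms -> marginal inverse (quantile) sampling."""
    L = cholesky(corr)
    eps = [rng.gauss(0.0, 1.0) for _ in range(len(corr))]
    z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(corr))]
    u = [NormalDist().cdf(zi) for zi in z]  # Gaussian copula: map to uniforms
    return [q(ui) for q, ui in zip(quantile_fns, u)]

# Hypothetical per-hour quantile functions for a capacity factor in [0, 1].
quantile_fns = [lambda u, s=s: s * u for s in (0.6, 0.7, 0.8, 0.9)]
rng = random.Random(42)
corr = ar1_correlation(4, rho=0.9)
scenarios = [sample_scenario(corr, quantile_fns, rng) for _ in range(3)]
print(scenarios)
```

Passing a different `corr` for each forecast window is what would make the correlation "dynamic". The marginals stay decoupled in `quantile_fns`.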
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Asia > China > Beijing > Beijing (0.05)
Reviews: Probabilistic Logic Neural Networks for Reasoning
This paper addresses knowledge base completion, i.e., filling in missing relations between pairs of entities. It does so by combining a statistical relational model such as Markov logic with a knowledge graph embedding (KGE) method such as TransE. The authors define a set of rules to be used in MLNs and then define a joint probability distribution over the observed and hidden triplets. Similarly, they define a joint probability distribution using a KGE approach (specifically, TransE). They then employ the variational EM algorithm to learn the MLN weights and, finally, to predict the probabilities of hidden triplets. Originality: I really liked the paper and thoroughly enjoyed reading it.
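For readers unfamiliar with TransE, which the review mentions: it models a relation as a translation in embedding space, so a plausible triplet (h, r, t) satisfies h + r ≈ t. The sketch below shows only that scoring idea, with the L2 norm (TransE also admits L1). The embeddings are hand-picked and hypothetical, not learned, and the variational EM coupling with the MLN is not shown.

```python
import math

def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; TransE treats smaller distances as more plausible triplets."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical 2-d embeddings, hand-picked rather than learned.
entity = {"Paris": [1.0, 0.0], "France": [1.0, 1.0], "Japan": [-1.0, 1.0]}
relation = {"capital_of": [0.0, 1.0]}

for tail in ("France", "Japan"):
    d = transe_distance(entity["Paris"], relation["capital_of"], entity[tail])
    print("(Paris, capital_of, %s): distance %.3f" % (tail, d))
```

Ranking candidate tails by this distance is how a KGE model scores hidden triplets for completion.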
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.59)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.42)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.39)